Phase disorder in frequency-domain speech enhancement algorithms introduces artifacts that limit denoising performance and degrade speech quality. To address this problem, a speech enhancement algorithm based on a Multi-Scale Ladder-type Time-Frequency Conformer Generative Adversarial Network (MSLTF-CMGAN) was proposed. Taking the real part, imaginary part, and magnitude of the speech spectrogram as input, the generator first used time-frequency Conformers at multiple scales to learn local and global feature dependencies across the time and frequency domains. Next, a Mask Decoder branch learned a magnitude mask while a Complex Decoder branch learned the clean spectrogram directly, and the outputs of the two decoder branches were fused to obtain the reconstructed speech. Finally, a metric discriminator was used to predict the scores of speech evaluation metrics, and the generator was trained against it in a minimax fashion to produce high-quality speech. Comparison experiments with various types of speech enhancement models were conducted on the public dataset VoiceBank+Demand using the subjective Mean Opinion Score (MOS) and objective evaluation metrics. Experimental results show that, compared with the current state-of-the-art speech enhancement method CMGAN (Conformer-based MetricGAN), MSLTF-CMGAN improves the MOS prediction of signal distortion (CSIG) and the MOS prediction of background noise intrusiveness (CBAK) by 0.04 and 0.07, respectively. Although its Perceptual Evaluation of Speech Quality (PESQ) and MOS prediction of the overall effect (COVL) are slightly lower than those of CMGAN, it still outperforms the other comparison models on several subjective and objective speech evaluation metrics.
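The two-branch fusion described in the abstract above can be illustrated with a minimal NumPy sketch. The abstract does not state how the branch outputs are combined, so the additive fusion below (masked noisy magnitude with the noisy phase, plus the complex-branch estimate) is an assumption, and the function names `model_inputs` and `fuse_decoder_outputs` are hypothetical.

```python
import numpy as np

def model_inputs(noisy_spec):
    """Stack real part, imaginary part, and magnitude as three input channels."""
    return np.stack([noisy_spec.real, noisy_spec.imag, np.abs(noisy_spec)], axis=0)

def fuse_decoder_outputs(noisy_spec, mask, complex_estimate):
    """Fuse the two decoder outputs into one complex spectrogram.

    noisy_spec:       complex array (freq, time), STFT of the noisy speech
    mask:             real array, same shape (hypothetical Mask Decoder output)
    complex_estimate: complex array, same shape (hypothetical Complex Decoder output)

    ASSUMPTION: the fusion is additive; the source only says the outputs are fused.
    """
    noisy_mag = np.abs(noisy_spec)
    noisy_phase = np.angle(noisy_spec)
    # Apply the magnitude mask and reuse the noisy phase.
    masked = mask * noisy_mag * np.exp(1j * noisy_phase)
    # Add the complex-branch estimate, which can correct the phase as well.
    return masked + complex_estimate
```

With an all-ones mask and a zero complex estimate, the fused result is simply the noisy spectrogram, which is a convenient sanity check for the shapes and the phase handling.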
As the objects measured in grating-projection 3D profile measurement become more and more complex, the extracted refined grating stripes contain a large number of breaks, which makes encoding the refined stripes very difficult. To address this problem, an automatic coding algorithm based on color structured light was proposed. A new color structured light model was designed, its design principle was introduced, and a new automatic stripe coding algorithm was implemented. First, the algorithm extracted the refined grating stripes, together with their color information, from the color structured grating. Then, the refined stripes of each color were encoded in order by determining the best connected domain. Finally, the stripe coding of the whole image was obtained through combined coding that exploits the periodicity of the grating model. The simulation results show that the color structured light model is simple in design, that the automatic stripe coding algorithm has high accuracy, and that the error is reduced to 10 percent. An ideal 3D point cloud data model can be reconstructed from the stripe-coded data.
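The combined-coding step in the abstract above, which merges the per-color stripe orders using the periodicity of the grating, might look like the following sketch. It assumes a hypothetical three-color pattern that repeats in a fixed order and that every stripe has already been detected; the connected-domain analysis that handles broken stripes is not reproduced here, and the function name `combine_stripe_codes` is illustrative only.

```python
def combine_stripe_codes(stripes, color_order=("R", "G", "B")):
    """Assign a global index to each stripe from its per-color order.

    stripes:     list of (color, x_position) tuples, one per detected stripe
    color_order: ASSUMED repeating color sequence of the projected grating

    Returns a dict mapping (color, x_position) to a global stripe index.
    """
    n_colors = len(color_order)
    codes = {}
    for offset, color in enumerate(color_order):
        # Order the stripes of this color from left to right.
        xs = sorted(x for c, x in stripes if c == color)
        for period, x in enumerate(xs):
            # The k-th stripe of a color lies in the k-th period of the pattern.
            codes[(color, x)] = period * n_colors + offset
    return codes
```

For example, six stripes detected in the order R, G, B, R, G, B from left to right receive the global indices 0 through 5, regardless of the order in which they appear in the input list.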
The K-means clustering algorithm generally cannot determine the number of clusters beforehand, and a manually specified initial number of clusters easily leads to unstable clustering results. An efficient algorithm for determining the optimal number of clusters for K-means was presented. The algorithm obtained the upper bound of the search range for the number of clusters from stratified samples of the data, and designed a new clustering validity indicator that evaluates the between-class and within-class similarity after clustering. The optimal number of clusters was then obtained within this search range. The simulation results show that the algorithm obtains the optimal number of clusters quickly and accurately, and that the resulting clustering of the dataset is good.
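The search over candidate cluster numbers described in the abstract above can be sketched as follows. The paper's own indicator and its stratified-sampling bound are not given in the abstract, so this sketch substitutes the well-known Calinski-Harabasz index as the between/within-class indicator and takes the upper bound `k_max` as a plain argument; the function names and the farthest-point initialization are likewise assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means with farthest-point initialization (an assumption)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers

def calinski_harabasz(X, labels):
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    n = len(X)
    ids = np.unique(labels)
    k = len(ids)
    overall = X.mean(axis=0)
    between = within = 0.0
    for j in ids:
        members = X[labels == j]
        center = members.mean(axis=0)
        between += len(members) * ((center - overall) ** 2).sum()
        within += ((members - center) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))

def best_k(X, k_max):
    """Evaluate k = 2..k_max and return the k with the highest indicator value."""
    scores = {k: calinski_harabasz(X, kmeans(X, k)[0]) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores
```

Restricting the search to `2..k_max` is what keeps the procedure cheap: each candidate requires one K-means run, so a tight upper bound directly reduces the total cost.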
Most variants of the Graph Cut algorithm do not impose any shape constraints on the segmentation, which makes it difficult to obtain semantically valid segmentation results. In pedestrian segmentation, this difficulty leads to segmented objects with non-human shapes. An improved Graph Cut algorithm combining shape priors with a discriminatively learned appearance model was proposed in this paper to segment pedestrians in static images. In this approach, a large number of real pedestrian silhouettes were used to encode the a priori shape of pedestrians, which biases the segmentation results toward human-like shapes, and a hierarchical model of pedestrian templates was built to reduce the matching time. A discriminative appearance model of the pedestrian was also proposed to better distinguish persons from the background. The experimental results verify the improved performance of this approach.
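To make the role of the shape prior in the abstract above concrete, a generic Graph Cut energy with an added shape term can be written as below. This is a common textbook form and not necessarily the formulation used in the paper; the symbols $D_p$, $V_{pq}$, $S_p$, $\lambda$, and $\mu$ are assumptions introduced here for illustration.

```latex
% Generic segmentation energy over a labeling L = {l_p}, l_p in {person, background}.
% D_p:    appearance (data) cost of assigning label l_p to pixel p
% V_{pq}: smoothness cost between neighboring pixels p and q in neighborhood N
% S_p:    shape-prior cost, e.g. derived from a matched pedestrian template
E(L) = \sum_{p} D_p(l_p)
     + \lambda \sum_{(p,q) \in \mathcal{N}} V_{pq}(l_p, l_q)
     + \mu \sum_{p} S_p(l_p)
```

In this form, a discriminative appearance model would enter through $D_p$ and the silhouette templates through $S_p$, while $\lambda$ and $\mu$ weight smoothness and shape against appearance.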